81 research outputs found

The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16

Author: Byrne Bill
Hasler Eva
Stahlberg Felix
Publication venue: ACL 2016 First Conference On Machine Translation (WMT16)
Publication date: 01/01/2016

This paper presents the University of Cambridge submission to WMT16. Motivated by the complementary nature of syntactical machine translation and neural machine translation (NMT), we exploit the synergies of Hiero and NMT in different combination schemes. Starting out with a simple neural lattice rescoring approach, we show that the Hiero lattices are often too narrow for NMT ensembles. Therefore, instead of a hard restriction of the NMT search space to the lattice, we propose to loosely couple NMT and Hiero by composition with a modified version of the edit distance transducer. The loose combination outperforms lattice rescoring, especially when using multiple NMT systems in an ensemble.

arXiv.org e-Print Archive

Crossref

Apollo (Cambridge)

Dynamic topic adaptation for improved contextual modelling in statistical machine translation

Author: Hasler Eva Cornelia
Publication venue: The University of Edinburgh
Publication date: 29/06/2015

In recent years there has been an increased interest in domain adaptation techniques for statistical machine translation (SMT) to deal with the growing amount of data from different sources. Topic modelling techniques applied to SMT are closely related to the field of domain adaptation but more flexible in dealing with unstructured text. Topic models can capture latent structure in texts and are therefore particularly suitable for modelling structure in between and beyond corpus boundaries, which are often arbitrary. In this thesis, the main focus is on dynamic translation model adaptation to texts of unknown origin, which is a typical scenario for an online MT engine translating web documents. We introduce a new bilingual topic model for SMT that takes the entire document context into account and for the first time directly estimates topic-dependent phrase translation probabilities in a Bayesian fashion. We demonstrate our model’s ability to improve over several domain adaptation baselines and further provide evidence for the advantages of bilingual topic modelling for SMT over the more common monolingual topic modelling. We also show improved performance when deriving further adapted translation features from the same model which measure different aspects of topical relatedness. We introduce another new topic model for SMT which exploits the distributional nature of phrase pair meaning by modelling topic distributions over phrase pairs using their distributional profiles. Using this model, we explore combinations of local and global contextual information and demonstrate the usefulness of different levels of contextual information, which had not been previously examined for SMT. We also show that combining this model with a topic model trained at the document-level further improves performance. Our dynamic topic adaptation approach performs competitively in comparison with two supervised domain-adapted systems. 
Finally, we shed light on the relationship between domain adaptation and topic adaptation and propose to combine multi-domain adaptation and topic adaptation in a framework that entails automatic prediction of domain labels at the document level. We show that while each technique provides complementary benefits to the overall performance, there is an amount of overlap between domain and topic adaptation. This can be exploited to build systems that require less adaptation effort at runtime.

Edinburgh Research Archive

Neural Machine Translation Decoding with Terminology Constraints

Author: Byrne WJ
de Gispert Adrià
Hasler Eva
Iglesias Gonzalo
Publication venue: Association for Computational Linguistics
Publication date: 10/09/2018

Despite the impressive quality improvements yielded by neural machine translation (NMT) systems, controlling their translation output to adhere to user-provided terminology constraints remains an open problem. We describe our approach to constrained neural decoding based on finite-state machines and multi-stack decoding which supports target-side constraints as well as constraints with corresponding aligned input text spans. We demonstrate the performance of our framework on multiple translation tasks and motivate the need for constrained decoding with attentions as a means of reducing misplacement and duplication when translating user constraints.

Apollo (Cambridge)

Sparse lexicalised features and topic adaptation for SMT

Author: Haddow Barry
Hasler Eva
Koehn Philipp
Publication date: 01/01/2012

Edinburgh Research Explorer

Dynamic Topic Adaptation for SMT using Distributional Profiles

Author: Haddow Barry
Hasler Eva
Koehn Philipp
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/06/2014

Edinburgh Research Explorer

Combining Domain and Topic Adaptation for SMT

Author: Haddow Barry
Hasler Eva
Koehn Philipp
Publication date: 01/01/2014

Edinburgh Research Explorer

Dynamic Topic Adaptation for Phrase-based MT

Author: Blunsom Phil
Haddow Barry
Hasler Eva
Koehn Philipp
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2014

Translating text from diverse sources poses a challenge to current machine translation systems which are rarely adapted to structure beyond corpus level. We explore topic adaptation on a diverse data set and present a new bilingual variant of Latent Dirichlet Allocation to compute topic-adapted, probabilistic phrase translation features. We dynamically infer document-specific translation probabilities for test sets of unknown origin, thereby capturing the effects of document context on phrase translations. We show gains of up to 1.26 BLEU over the baseline and 1.04 over a domain adaptation benchmark. We further provide an analysis of the domain-specific data and show additive gains of our model in combination with other types of topic-adapted features.

CiteSeerX

Crossref

Edinburgh Research Explorer